Cunha towards discourse parsing in Spanish
Author
Abstract
Texts can be analysed from different perspectives. One of the most difficult phenomena to process is discourse structure (Hovy 2010). In recent years, one of the main challenges in the field of natural language processing (NLP) has been discourse parsing. Research on this topic has been done for several languages, such as Japanese (Sumita et al. 1992), English (Marcu 2000) and Portuguese (Pardo 2008), among others. Also, for English, the CoNLL-2015 Shared Task focused on shallow discourse parsing. Discourse-annotated corpora have been created too, for example for English (Carlson et al. 2002), German (Stede 2004), Portuguese (Pardo 2008) and French (Afantenos 2012). Discourse parsing tools and resources are used to develop NLP applications, for example automatic summarization, information extraction, text generation, machine translation and sentiment analysis (Taboada & Mann 2004). The aim of this paper is to present the advances in discourse parsing for Spanish. Specifically, after explaining our theoretical framework, we will detail the tools we have developed for the automatic annotation of discourse information in Spanish texts and the discourse-annotated resources we have created.
Similar resources
A Symbolic Corpus-based Approach to Detect and Solve the Ambiguity of Discourse Markers
At present, discourse parsing is an important research topic. Rhetorical Structure Theory (RST) is one of the most popular approaches in this field. In general, discourse parsing includes three stages: discourse segmentation, discourse relations detection and building up rhetorical trees. Different strategies are used when developing discourse parsers. One of the strategies to detect discourse ...
DiSeg: Un segmentador discursivo automático para el español
Nowadays discourse parsing is a very prominent research topic. However, there is not a discourse parser for Spanish texts. The first stage in order to develop this tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish that uses the framework of the Rhetorical Structure Theory (Mann and Thompson, 1988) and is based on lexical and syntactic rule...
Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan
At present, automatic discourse analysis is a relevant research topic in the field of NLP. However, discourse is one of the phenomena most difficult to process. Although discourse parsers have been already developed for several languages, this tool does not exist for Catalan. In order to implement this kind of parser, the first step is to develop a discourse segmenter. In this article we presen...
Discourse Segmentation for Sentence Compression
Earlier studies have raised the possibility of summarizing at the level of the sentence. This simplification should help in adapting textual content in a limited space. Therefore, sentence compression is an important resource for automatic summarization systems. However, there are few studies that consider sentence-level discourse segmentation for compression task; to our knowledge, none in Spa...
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic...